Regularized sequence-level deep neural network model adaptation

Authors

  • Yan Huang
  • Yifan Gong
Abstract

We propose a regularized sequence-level (SEQ) deep neural network (DNN) model adaptation methodology as an extension of the previous KL-divergence regularized cross-entropy (CE) adaptation [1]. In this approach, the negative KL-divergence between the baseline and the adapted model is added to the maximum mutual information (MMI) objective as a regularization term in the sequence-level adaptation. We compared eight adaptation setups, each specified by the baseline training criterion, the adaptation criterion, and the regularization methodology. We found that the proposed sequence-level adaptation consistently outperforms the cross-entropy adaptation, and that regularization is critical for both. We further introduce a unified formulation of which the regularized CE and SEQ adaptations are special cases. We applied the proposed approach to speaker adaptation and accent adaptation in a mobile short message dictation task. For speaker adaptation with 25 or 100 utterances, the proposed approach yields a 13.72% or 23.18% WER reduction, respectively, when adapting from the CE baseline, compared with 11.87% or 20.18% for the CE adaptation. For accent adaptation with 1K utterances, the proposed approach yields an 18.74% or 19.50% WER reduction when adapting from the CE-DNN or the SEQ-DNN, respectively. The corresponding WER reductions using the regularized CE adaptation are 15.98% and 15.69%.
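The frame-level KL-divergence regularized CE adaptation that this work extends can be illustrated with a small sketch. It assumes the commonly used formulation in which adding a KL-divergence penalty to the CE loss is equivalent to training against a target that interpolates the hard label with the baseline model's posterior. The interpolation weight `rho`, the function names, and the toy logits are illustrative assumptions, not values or code from the paper, and the sequence-level (MMI) variant is not implemented here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kld_regularized_ce(adapted_logits, baseline_logits, label, rho):
    """Per-frame CE loss against an interpolated target (hypothetical helper).

    target[k] = (1 - rho) * one_hot(label)[k] + rho * baseline_posterior[k]
    loss      = -sum_k target[k] * log(adapted_posterior[k])

    rho = 0 gives plain CE adaptation; rho = 1 ignores the adaptation label
    and only pulls the adapted model toward the baseline posterior.
    """
    adapted = softmax(adapted_logits)
    baseline = softmax(baseline_logits)
    loss = 0.0
    for k, (p_adapted, p_base) in enumerate(zip(adapted, baseline)):
        hard = 1.0 if k == label else 0.0
        target = (1.0 - rho) * hard + rho * p_base
        loss -= target * math.log(p_adapted)
    return loss


if __name__ == "__main__":
    # Toy three-class frame; the logits are made up for illustration.
    adapted = [1.0, 2.0, 0.5]
    baseline = [0.8, 1.5, 0.9]
    for rho in (0.0, 0.5, 1.0):
        print(rho, kld_regularized_ce(adapted, baseline, label=1, rho=rho))
```

A larger `rho` keeps the adapted model closer to the baseline, which is the role regularization plays when only a small amount of adaptation data (such as 25 utterances) is available.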

Similar resources

Multi-accent deep neural network acoustic model with accent-specific top layer using the KLD-regularized model adaptation

We propose a multi-accent deep neural network acoustic model with an accent-specific top layer and shared bottom hidden layers. The accent-specific top layer is used to model the distinct accent-specific patterns. The shared bottom hidden layers allow maximum knowledge sharing between the native and the accent models. This design is particularly attractive when considering deploying such a syst...

Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM

Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...

Quantitative Structure-Activity Relationship Study on Thiosemicarbazone Derivatives as Antitubercular agents Using Artificial Neural Network and Multiple Linear Regression

Background and purpose: Nonlinear analysis methods for quantitative structure–activity relationship (QSAR) studies describe molecular behavior better than linear analysis. Artificial neural networks are mathematical models and algorithms that imitate the information processing and learning of the human brain. Some S-alkyl derivatives of thiosemicarbazone have been shown to be beneficial in prevention and...

Deep Reinforcement Learning with Regularized Convolutional Neural Fitted Q Iteration

We review the deep reinforcement learning setting, in which an agent receiving high-dimensional input from an environment learns a control policy without supervision using multilayer neural networks. We then extend the Neural Fitted Q Iteration value-based reinforcement learning algorithm (Riedmiller et al) by introducing a novel variation which we call Regularized Convolutional Neural Fitted Q...

The UEDIN English ASR System for the IWSLT 2013 Evaluation

This paper describes the University of Edinburgh (UEDIN) English ASR system for the IWSLT 2013 Evaluation. Notable features of the system include deep neural network acoustic models in both tandem and hybrid configuration, cross-domain adaptation with multi-level adaptive networks, and the use of a recurrent neural network language model. Improvements to our system since the 2012 evaluation – w...

Publication date: 2015